Search Result

Analysis of large-scale distributed machine learning systems: a case study on LDA

TANG Lizhe, FENG Dawei, LI Dongsheng, LI Rongchun, LIU Feng

Journal of Computer Applications 2017, 37 (3): 628-634. DOI: 10.11772/j.issn.1001-9081.2017.03.628

To address the problems of scalability, algorithm convergence and operational efficiency in building large-scale machine learning systems, the challenges that large-scale samples, large models and network communication pose to such systems were analyzed, and the solutions adopted by existing systems were presented. Taking the Latent Dirichlet Allocation (LDA) model as an example, three open-source distributed LDA systems (Spark LDA, PLDA+ and LightLDA) were compared, and their differences in system design, implementation and performance were analyzed in terms of system resource consumption, algorithm convergence and scalability. The experimental results show that the memory usage of LightLDA and PLDA+ is about half that of Spark LDA, and that their convergence speed is 4 to 5 times that of Spark LDA on small sample sets and models. On large-scale sample sets and models, the network communication volume and convergence time of LightLDA are much smaller than those of PLDA+ and Spark LDA, showing good scalability. Combining data parallelism with model parallelism can effectively meet the challenges of large-scale samples and models. The Stale Synchronous Parallel (SSP) model for parameter synchronization, local caching of the model and sparse storage of parameters can effectively reduce network cost and improve system efficiency.
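
As a rough illustration of the Stale Synchronous Parallel idea named in the abstract, the sketch below tracks how many iterations each worker has completed and lets a worker start its next iteration only while it is at most a fixed number of iterations ahead of the slowest worker. The class name `SSPClock`, its methods and the `staleness` parameter are hypothetical and are not taken from Spark LDA, PLDA+ or LightLDA; this is a simplified model of the scheduling rule only, with no parameter server or networking.

```python
class SSPClock:
    """Toy model of the SSP scheduling rule (hypothetical helper, not a library API)."""

    def __init__(self, num_workers, staleness):
        # staleness = 0 behaves like a barrier after every iteration (bulk synchronous).
        self.staleness = staleness
        # clocks[w] counts the iterations worker w has completed so far.
        self.clocks = [0] * num_workers

    def tick(self, worker):
        """Record that `worker` has finished one more iteration."""
        self.clocks[worker] += 1

    def can_start_next(self, worker):
        """True if `worker` is at most `staleness` iterations ahead of the slowest worker."""
        return self.clocks[worker] - min(self.clocks) <= self.staleness


# Usage: with staleness 1, worker 0 may run one iteration ahead, then must wait.
ssp = SSPClock(num_workers=3, staleness=1)
ssp.tick(0)
first_check = ssp.can_start_next(0)   # lead of 1 is within the bound
ssp.tick(0)
second_check = ssp.can_start_next(0)  # lead of 2 exceeds the bound
```

A larger `staleness` lets fast workers keep computing on slightly outdated parameters instead of idling at a barrier, which is the trade-off the abstract credits with reducing network cost.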

Optimization of cloud task scheduling based on discrete artificial bee colony algorithm

NI Zhiwei, LI Rongrong, FANG Qinghua, PANG Shanshan

Journal of Computer Applications 2016, 36 (1): 107-112. DOI: 10.11772/j.issn.1001-9081.2016.01.0107

To meet the high quality requirements of virtual resource services in cloud computing applications, and to address the problem that current cloud task scheduling considers only a single objective, a Discrete Artificial Bee Colony (DABC) algorithm for cloud task scheduling optimization was proposed that takes into account the users' shortest waiting time, resource load balancing and economic cost. First, a multi-objective mathematical model of cloud task scheduling was established. Second, by combining a preference satisfaction policy, introducing a local search operator and changing the search behavior of the scout bees, an optimization strategy based on the Multi-objective DABC (MDABC) algorithm was proposed to solve the problem. Simulation results for different cloud task scheduling scenarios show that the proposed MDABC algorithm achieves higher comprehensive satisfaction than the basic DABC algorithm, the Genetic Algorithm (GA) and the classical greedy algorithm. The proposed MDABC algorithm can therefore better improve the performance of cloud task scheduling in virtual resource systems, and it generalizes better.

Improved ant colony optimization for QoS-based Web service composition optimization

NI Zhiwei, FANG Qinghua, LI Rongrong, LI Yiming

Journal of Computer Applications 2015, 35 (8): 2238-2243. DOI: 10.11772/j.issn.1001-9081.2015.08.2238

The basic Ant Colony Optimization (ACO) algorithm searches slowly in its early stage and easily falls into local optima in its later stage. To overcome these shortcomings, an initial pheromone distribution strategy and a local optimization strategy were proposed, and a new pheromone updating rule was put forward to strengthen the effective accumulation of pheromone. The improved ACO was applied to the QoS-based Web service composition optimization problem, and its feasibility and effectiveness were verified on the QWS2.0 dataset. The experimental results show that the proposed ACO finds more Pareto solutions and has stronger optimization capability and more stable performance than three baselines: the basic ACO, an improved ACO that updates the pheromone according to the distance between a solution and the ideal solution, and an improved genetic algorithm that introduces individual domination strength into environmental selection.
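
For readers unfamiliar with pheromone updating, the sketch below shows the textbook evaporation-and-deposit step used by basic ACO variants. It is not the new updating rule proposed in the paper, which the abstract does not specify. The function name `update_pheromone`, the parameters `rho` and `q`, and the (task, service) keys are assumptions chosen for illustration.

```python
def update_pheromone(pheromone, solutions, rho=0.1, q=1.0):
    """Return a new pheromone table after one evaporation-and-deposit step.

    pheromone: dict mapping a solution component, here a (task, service) pair,
               to its current pheromone level.
    solutions: list of (components, cost) pairs built by the ants; cost must be > 0.
    rho:       evaporation rate in [0, 1].
    q:         deposit constant; each ant adds q / cost to every component it used.
    """
    # Evaporation: every trail decays by the same factor.
    updated = {key: (1.0 - rho) * tau for key, tau in pheromone.items()}
    # Deposit: cheaper solutions reinforce their components more strongly.
    for components, cost in solutions:
        for key in components:
            updated[key] = updated.get(key, 0.0) + q / cost
    return updated


# Usage: one ant picked service "s1" for task "t1" with aggregate cost 2.0.
trails = {("t1", "s1"): 1.0, ("t1", "s2"): 1.0}
new_trails = update_pheromone(trails, [([("t1", "s1")], 2.0)], rho=0.5, q=1.0)
```

Returning a new table instead of mutating the input keeps the step easy to test; a production implementation would more likely update in place.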

Simulink-based uncertain abnormal pattern recognition of quality control chart

HOU Shi-wang, ZHU Hui-ming, LI Rong

Journal of Computer Applications 2012, 32 (10): 2940-2943. DOI: 10.3724/SP.J.1087.2012.02940

A control chart is in an uncertain abnormal state when a plotted point is close to the critical value, when the number of points is close to the prescribed target, or when several abnormalities occur concurrently. Traditional methods have difficulty recognizing such patterns. Considering the concurrence of trend and cycle patterns, the original control chart signal was decomposed by wavelets, and the different abnormal signals were reconstructed with appropriate wavelet coefficients. By curve fitting, the goodness of fit to the reconstructed wavelet signals was taken as the characteristic number of each abnormal pattern. The occurrence degrees of the uncertain patterns were then calculated by inputting the characteristic numbers into the membership functions of the corresponding patterns. A simulation model of this approach was developed in Matlab/Simulink. Finally, an application example was given, and the result shows the feasibility of the approach.

Multiple ellipses detection based on curve arc segmentation of edge

Nan-nan LI, Rong-sheng LU, Shuai LI, Yan XU, Yan-qiong SHI

Journal of Computer Applications 2011, 31 (07): 1853-1855. DOI: 10.3724/SP.J.1087.2011.01853

A new efficient algorithm for ellipse detection was proposed, which is based on edge grouping rather than the standard Hough transform. First, the edge boundaries were split into arcs at their intersections, and the arcs were divided into two categories, long and short, each sorted in non-increasing order of length. Then the parameters of the ellipses were estimated by least-squares fitting using arcs that may belong to the same ellipse. Finally, the candidate ellipses obtained in the preceding steps were verified to determine whether they are real ones. The method was tested on synthetic and real-world images containing both complete and incomplete ellipses. The results demonstrate that the algorithm is robust, accurate and effective.
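
The least-squares step can be sketched as follows, under assumptions that are not from the paper: the conic is written as A x^2 + B xy + C y^2 + D x + E y = 1 (which excludes conics passing through the origin), the points of all arcs thought to belong to one ellipse are pooled, and the normal equations are solved with a small Gaussian elimination. The names `fit_conic`, `is_ellipse`, `solve_linear` and `ellipse_arc` are hypothetical; the paper's actual fitting formulation and its arc grouping and verification steps are not shown.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_conic(points):
    """Least-squares estimate of [A, B, C, D, E] from at least five (x, y) points."""
    rows = [[x * x, x * y, y * y, x, y] for x, y in points]
    # Normal equations (M^T M) p = M^T 1 for the overdetermined system M p = 1.
    mtm = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    mt1 = [sum(r[i] for r in rows) for i in range(5)]
    return solve_linear(mtm, mt1)


def is_ellipse(params):
    """A conic is an ellipse when its discriminant B^2 - 4AC is negative."""
    a, b, c, _, _ = params
    return b * b - 4.0 * a * c < 0.0


def ellipse_arc(a, b, start, end, n):
    """Sample n >= 2 points on the arc of x^2/a^2 + y^2/b^2 = 1 between two angles."""
    step = (end - start) / (n - 1)
    return [(a * math.cos(start + k * step), b * math.sin(start + k * step))
            for k in range(n)]


# Usage: two disjoint arcs of the same ellipse (semi-axes 2 and 1) are pooled and fitted.
arcs = ellipse_arc(2.0, 1.0, 0.0, 1.5, 6) + ellipse_arc(2.0, 1.0, 3.0, 4.5, 6)
params = fit_conic(arcs)
```

Pooling arcs before fitting mirrors the abstract's idea of estimating one ellipse from several arcs; with noisy edge points, a constrained fit that enforces the ellipse condition would usually be preferable to this unconstrained one.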